Multi-modal Dialogues as Natural User Interface for Automobile Environment

Authors

  • Anurag Gupta
  • Raymond H. Lee
  • Eric H. C. Choi
Abstract

We are evolving into a mobile information age where information is expected to be accessed anywhere, anytime, wherever it is needed. In this paper, we consider the issues of handling the problems of modality recognizer errors and consistency between multiple modality inputs for a multi-modal dialogue interface in the automobile environment. We describe our work on utterance verification and its application in dialogue management. Also described is the integration method used in multi-modal input fusion. Our work is done in the context of an in-car telematics prototype, and a brief introduction of the system is provided.

INTRODUCTION

With the proliferation of the internet providing access to an enormous amount of information, and the increasing popularity of mobile devices such as the mobile phone and the personal digital assistant (PDA), we are evolving into a mobile information age where information is expected to be accessed anywhere, anytime, wherever it is needed. As Maes and Raman [Maes & Raman 2000] pointed out, the implication of such a mobile information age is the demand for a uniform, multi-modal and dialogue-driven user interface so that users can maximize their interaction with information devices.

Information access systems have been introduced to the car environment to enhance the driving experience. For example, the OnStar system [http://www.onstar.com] provides services ranging from call center services for security or emergency purposes to location-aware services using the global positioning system (GPS) and voice-enabled access to web-based information. Car equipment manufacturers, such as VDO Dayton [http://www.vdodayton.com] and Car Nav Systems [http://www.carnav.com], have developed car multimedia systems that integrate a navigation system, car audio and even a communication system. Navigation systems with rich displays have been made available as accessories on luxury cars. However, all of them provide limited interaction with the system and are not seamlessly integrated with other existing equipment on board. In order for such a system to integrate properly into the automobile environment and to realize a driving experience that enhances the driver's safety, security and enjoyment, it has to coordinate with the existing driving systems and the user in a seamless, unified manner.

The car cockpit poses various challenges as a mobile environment. First, the hands-free, eyes-free environment dictates that speech is the only acceptable input modality while driving. As the environment is inherently noisy, speech recognition accuracy will be low. Second, because of the high cognitive load already imposed on the driver by the driving task, any information system introduced to the automobile environment should at least not increase the driver's cognitive load (and ideally should reduce it).

We propose that a user interface based on multi-modal dialogue is needed in the automobile environment. An important issue here is the proper handling of user input. In particular, the problems of modality recognizer errors and consistency between different input modalities have to be addressed in the multi-modal dialogue interface. In this paper, we present our work on utterance verification using confidence measures and its application in the dialogue manager to control modality selection. We also present our work on multi-modal fusion in integrating semantic data from different modalities, and outline our approach to resolving conflicts in the modality inputs.
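As a rough illustration of how a confidence measure can drive modality selection in a dialogue manager, the following Python sketch (not taken from the paper; the thresholds, names and three-way decision are assumptions for this example) accepts, asks for confirmation of, or redirects a speech hypothesis depending on the recognizer's confidence score.

```python
# Illustrative sketch (not the TIA implementation): accept, confirm, or
# redirect a speech recognition hypothesis based on its confidence score,
# falling back to the touch modality when confidence is poor.

from dataclasses import dataclass

@dataclass
class SpeechHypothesis:
    text: str          # recognized utterance
    confidence: float  # recognizer confidence in [0.0, 1.0]

# Hypothetical thresholds; a real system would tune these on held-out data.
ACCEPT_THRESHOLD = 0.8
CONFIRM_THRESHOLD = 0.5

def verify_utterance(hyp: SpeechHypothesis) -> str:
    """Map a confidence score to a dialogue-manager action."""
    if hyp.confidence >= ACCEPT_THRESHOLD:
        return "accept"           # use the hypothesis as-is
    if hyp.confidence >= CONFIRM_THRESHOLD:
        return "confirm"          # ask the user to confirm the utterance
    return "switch_modality"      # prompt the user to use touch instead

if __name__ == "__main__":
    print(verify_utterance(SpeechHypothesis("navigate to the airport", 0.91)))  # accept
    print(verify_utterance(SpeechHypothesis("navigate to the airport", 0.62)))  # confirm
    print(verify_utterance(SpeechHypothesis("navigate to the airport", 0.30)))  # switch_modality
```

In a noisy car cockpit, the practical effect of such a policy is that low-confidence speech input does not silently corrupt the dialogue state; the system either re-confirms or invites the more reliable modality.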
We investigate these issues in the context of an in-car telematics prototype called TIA. The TIA system (see Figure 1) currently implements a map-based navigation application. A multi-modal dialogue interface has been developed to allow the driver to interact with the application using voice, touch, or a combination of both. Both voice and image output modalities are implemented to provide the application's response back to the driver.

Figure 1: TIA system in car

TELEMATICS IN-CAR ASSISTANT (TIA)

Figure 2 shows the logical architecture of the TIA system. It consists of the following components:

1. A speech recognition and understanding engine (ASU) for processing user speech input.
2. A touch input component (TIP) for processing touch-screen input, including the selection of buttons, gestures on the map, etc.
3. A multi-modal input fusion component (MMIF) for combining input from multiple modalities.
4. A dialogue manager (DM) for coordinating user interactions.
5. A natural language generation (NLG) component to summarize the route every time a destination has been defined.
6. A text-to-speech synthesizer (TTS) for voice output.
7. A graphic display component for displaying various GUI components such as menus, buttons and the map on the touch screen.
8. An application back-end (APP) consisting of an application module, the infotainment database and the GPS module.

Each component within the architecture is implemented as an agent using SRI's Open Agent Architecture (OAA) [Martin et al. 1999]. Each agent was developed independently to provide a number of pre-defined services (called solvables within OAA). In other words, the multi-modal dialogue interface is made up of a multitude of agents working in a collaborative manner.
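To make the role of a fusion component like MMIF concrete, the following Python sketch combines a partially specified spoken command with a deictic touch gesture on the map. It is not the MMIF implementation and does not use OAA; the frame fields, the two-second pairing window and the conflict-resolution rule are assumptions chosen for illustration.

```python
# Illustrative sketch of multi-modal input fusion: merge a spoken command
# that lacks a destination with a touch gesture that supplies one, and
# resolve a conflict when both modalities specify a destination.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Frame:
    intent: str
    destination: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    timestamp: float = 0.0                              # seconds

def fuse(speech: Frame, touch: Frame, max_gap: float = 2.0) -> Frame:
    """Unify a spoken frame with a touch frame if they arrive close enough
    in time; otherwise keep the spoken frame unchanged."""
    if abs(speech.timestamp - touch.timestamp) > max_gap:
        return speech                                   # too far apart to combine
    merged = Frame(intent=speech.intent,
                   timestamp=max(speech.timestamp, touch.timestamp))
    if (speech.destination and touch.destination
            and speech.destination != touch.destination):
        # Conflict: in this sketch the touch point, being the more explicit
        # deictic act, overrides the spoken destination.
        merged.destination = touch.destination
    else:
        merged.destination = speech.destination or touch.destination
    return merged

# "Take me there" plus a tap on the map fills the missing destination slot.
spoken = Frame(intent="navigate", destination=None, timestamp=10.1)
tapped = Frame(intent="navigate", destination=(-37.81, 144.96), timestamp=10.6)
print(fuse(spoken, tapped))
```

The key design choice in any such fusion step is the conflict policy: whether the more recent, the more explicit, or the more confident modality wins when the inputs disagree.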

Similar articles

Formal Design, Verification and Simulation of Multi-modal Dialogues

We have designed and implemented a dialogue management design tool for use in dialogue design as a component of user-interface design in multi-modal applications. The tool provides: a formal language (typed feature structures) for describing states and events; a simple rule formalism for specifying dialogues; an automatic dialogue-property checking module; a dialogue-simulator for interactive t...


REAL: Situated Dialogues in Instrumented Environments

We give a survey of the research project REAL, where we investigate how a system can proactively assist its user in solving different tasks in an instrumented environment by sensing implicit interaction and utilising distributed presentation media. First we introduce the architecture of our instrumented environment, which uses a blackboard to coordinate the components of the environment, such a...


Combining Audio and Video in Perceptive Spaces

Virtual environments have great potential in applications such as entertainment, animation by example, design interface, information browsing, and even expressive performance. In this paper we describe an approach to unencumbered, natural interfaces called Perceptive Spaces with a particular focus on efforts to include true multi-modal interface: interfaces that attend to both the speech and ge...


Towards a taxonomy of error-handling strategies in recognition-based multi-modal human-computer interfaces

In this paper, we survey the different types of error-handling strategies that have been described in the literature on recognition-based human–computer interfaces. A wide range of strategies can be found in spoken human–machine dialogues, handwriting systems, and multi-modal natural interfaces. We then propose a taxonomy for classifying error-handling strategies that has the following three dim...


AGENT: Awareness Game Environment for Natural Training

We propose AGENT, the Awareness Game Environment for Natural Training, as a virtual environment in which serious games can be enacted. AGENT combines research on interactive storytelling, game design, turn-taking and social signal processing with a multi-modal UI in a modular fashion. Current work in progress will deliver a first demonstrable prototype within 2013.




Publication date: 2002